Measuring Performance when Positives are Rare

Authors

  • S. H. Muggleton
  • C. H. Bryant

Abstract

This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate, comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). As far as the authors are aware, this is both the first biological grammar learnt using ILP and the first real-world scientific application of the positive-only learning framework of CProgol. Performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The RA results show that searching for NPPs by using our best NPP predictor as a filter is more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. The highest RA was achieved by a model that includes grammar-derived features. This RA is significantly higher than the best RA achieved without the grammar-derived features.
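The abstract contrasts predictive accuracy with Relative Advantage but does not state the RA formula. The sketch below assumes one reading consistent with the "more than 100 times more efficient" claim: RA is the number of proteins tested per positive found under random selection, divided by the number tested per positive found when only predicted positives are synthesised and tested, with every test assumed to cost the same. Under that assumption RA reduces to precision divided by the prevalence of positives. The names `Counts`, `predictive_accuracy` and `relative_advantage` are hypothetical, and the confusion-matrix counts are invented for illustration; they are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for a binary predictor (positive = e.g. an NPP)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def predictive_accuracy(c: Counts) -> float:
    """Fraction of all cases classified correctly."""
    return (c.tp + c.tn) / c.total


def relative_advantage(c: Counts) -> float:
    """Hypothetical RA (an assumption, not the paper's stated definition):
    average tests per positive found by random selection, divided by
    average tests per positive found when only predicted positives are tested.

    Assumes every synthesis/test has the same cost; algebraically this equals
    precision / prevalence. Returns 0.0 if the filter never finds a positive.
    """
    positives = c.tp + c.fn
    if positives == 0:
        raise ValueError("RA is undefined when the data contain no positives")
    if c.tp == 0:
        return 0.0  # filtering never yields a positive: no advantage
    cost_random = c.total / positives     # tests per positive if all proteins are tested
    cost_filtered = (c.tp + c.fp) / c.tp  # tests per positive if only predicted positives are tested
    return cost_random / cost_filtered


if __name__ == "__main__":
    # Invented counts: 100 positives among 100,000 proteins (prevalence 0.1%).
    always_negative = Counts(tp=0, fp=0, tn=99_900, fn=100)
    some_filter = Counts(tp=80, fp=720, tn=99_180, fn=20)
    for name, c in [("always negative", always_negative), ("filter", some_filter)]:
        print(f"{name}: accuracy={predictive_accuracy(c):.4f}, RA={relative_advantage(c):.1f}")
```

With these hypothetical counts the classifier that always answers "negative" has the higher accuracy (0.999 versus 0.9926) yet never finds a positive, while the filter reaches an RA of 100. This is the kind of gap that makes accuracy a poor guide when positives are rare.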


Similar resources

Measuring Performance when Positives are Rare: Relative Advantage versus Predictive

This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class o...


Measuring Performance when Positives Are Rare: Relative Advantage versus Predictive Accuracy - A Biological Case Study

This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of ...


An Integrated Approach for Measuring Performance of Network structure: Case study on power plants

Data envelopment analysis (DEA) and the balanced scorecard (BSC) are two well-known approaches for measuring the performance of decision making units (DMUs). BSC is especially suited to qualitative measures, whereas DEA is more appropriate when quantitative measures are used for evaluation. In the real world, DMUs usually have complex structures such as network structures. One of the well-known network s...


Improved Procedure for Screening Expression Libraries for Novel Autoantigens

The standard method for immunoscreening of a cDNA expression library is time-consuming because of the large proportion of false positives produced during the first and second rounds of screening. This problem is more important when a sensitive chemiluminescence detection system is used. Due to the high sensitivity of the detection system, there is a need to avoid false posi...


Context-Dependent Data Envelopment Analysis-Measuring Attractiveness and Progress with Interval Data

Data envelopment analysis (DEA) is a method for identifying the efficient frontier of decision making units (DMUs). This paper presents a context-dependent DEA that uses interval inputs and outputs. A context-dependent approach with interval inputs and outputs can evaluate a set of DMUs against a specific context. Each context shows an efficient frontier including DMUs in particular l...



Publication date: 2000